
Finite Time Logarithmic Regret Bounds for Self-Tuning Regulation

Singh, R; Mete, A; Kar, A; Kumar, P R (July 2024, Proceedings of the 41st International Conference on Machine Learning)

We establish the first finite-time logarithmic regret bounds for the self-tuning regulation problem. We introduce a modified version of the certainty equivalence algorithm, which we call PIECE, that clips inputs in addition to utilizing probing inputs for exploration. We show that its regret after $T$ time-steps is upper bounded by $C \log T$ for bounded noise and by $C \log^3 T$ for sub-Gaussian noise, unlike the LQ problem, where logarithmic regret has been shown to be impossible. The PIECE algorithm is also designed to address the critical challenge of poor initial transient performance of reinforcement learning algorithms for linear systems. Comparative simulation results illustrate the improved performance of PIECE.
